figure 3
LimitstoDepth-EfficienciesofSelf-Attention
Self-attention architectures, which are rapidly pushing the frontier innatural language processing, demonstrate asurprising depth-inefficient behavior: previous works indicate that increasing the internal representation (network width) isjust as useful as increasing the number of self-attention layers (network depth).
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
6cfe0e6127fa25df2a0ef2ae1067d915-Paper.pdf
However,maximum-marginclassifiers areinherently robusttoperturbations ofdata at prediction time, and this implication is at odds with concrete evidence that neural networks, in practice, are brittle toadversarial examples [71]and distribution shifts [52,58,44,65]. Hence, the linear setting, while convenient to analyze, is insufficient to capture the non-robustness of neural networkstrainedonrealdatasets.Goingbeyondthelinearsetting,severalworks[ 1,49,74]arguethat neuralnetworksgeneralize wellbecause standard training procedures haveabiastowardslearning
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Europe > Italy > Tuscany > Florence (0.04)
- Asia > South Korea > Seoul > Seoul (0.05)
- North America > United States > California (0.04)
- Asia > South Korea > Daejeon > Daejeon (0.04)
- Asia > China > Shaanxi Province > Xi'an (0.05)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > South Korea > Seoul > Seoul (0.04)
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- (6 more...)
- Workflow (0.68)
- Research Report > New Finding (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
- Information Technology > Data Science (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)